Adapting hierarchical clustering distance measures for improved presentation of relationships between transaction elements
Abstract
A common goal of descriptive data mining techniques is to present new information in concise, easily interpretable ways. Hierarchical clustering, for example, enables simple visualization of distances between analyzed objects or attributes. However, the distance measures commonly offered by existing data mining tools are usually not well suited to analyzing transactional data with this technique. Including new types of measures aimed specifically at transactional data can make hierarchical clustering a much more feasible choice for transactional data analysis. This paper presents and analyzes suitable measure types and provides methods for transforming them so that they represent distances between transaction elements more appropriately. The developed measures are implemented, verified, and compared in hierarchical clustering analysis on both artificial data and reference transactional datasets.
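To make the idea of clustering transaction elements concrete, the following is a minimal sketch and not the paper's method. It assumes the Jaccard distance between items (based on the sets of transactions in which each item occurs) as a stand-in for the measures the paper develops, and it uses a naive average-linkage procedure. The function names `item_distances` and `average_linkage` and the toy basket data are hypothetical.

```python
from itertools import combinations


def item_distances(transactions):
    """Jaccard distance between items (an assumed measure, not the paper's):
    1 - |T(a) & T(b)| / |T(a) | T(b)|, where T(x) is the set of
    transaction indices that contain item x."""
    occurs = {}
    for tid, items in enumerate(transactions):
        for item in items:
            occurs.setdefault(item, set()).add(tid)
    dist = {}
    for a, b in combinations(sorted(occurs), 2):
        union = occurs[a] | occurs[b]
        dist[frozenset((a, b))] = 1.0 - len(occurs[a] & occurs[b]) / len(union)
    return dist


def average_linkage(dist):
    """Naive agglomerative clustering with average linkage.
    Returns the merges in order as (cluster_a, cluster_b, height)."""
    items = sorted({i for pair in dist for i in pair})
    clusters = [(i,) for i in items]
    merges = []

    def d(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: d(*p))
        merges.append((c1, c2, d(c1, c2)))
        merged = tuple(sorted(c1 + c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
    return merges


# Toy market-basket data (hypothetical).
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "chips"},
    {"beer", "chips", "milk"},
]
merges = average_linkage(item_distances(transactions))
for left, right, height in merges:
    print(left, right, round(height, 3))
```

Items that always appear together merge at height 0, while items that never share a transaction merge only near height 1, which is the kind of relationship a dendrogram over transaction elements would display.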
Related resources
Comprehensive Survey on Distance / Similarity Measures between Probability Density Functions
Distance or similarity measures are essential to solve many pattern recognition problems such as classification, clustering, and retrieval problems. Various distance/similarity measures that are applicable to compare two probability density functions, pdf in short, are reviewed and categorized in both syntactic and semantic relationships. A correlation coefficient and a hierarchical clustering ...
An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering
Multivariate time series (MTS) data are ubiquitous in science and daily life, and measuring their similarity is a core part of the MTS analysis process. Many research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...
Taxonomy of Nominal Type Histogram Distance Measures
Distance or similarity measures are of fundamental importance to pattern classification, clustering, and information retrieval problems. Various distance/similarity measures that are applicable to compare two nominal type histograms are reviewed and categorized in both syntactic and semantic relationships. A correlation coefficient and a hierarchical clustering technique are adopted t...
Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the rapid growth of spatial trajectory data and the need to process it and extract useful information and meaningful patterns have attracted many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...
New distance and similarity measures for hesitant fuzzy soft sets
The hesitant fuzzy soft set (HFSS), a combination of hesitant fuzzy sets and soft sets, is regarded as a useful tool for dealing with the uncertainty and ambiguity of real-world problems. In HFSSs, each element is defined in terms of several parameters with arbitrary membership degrees. In addition, distance and similarity measures are considered important tools in different areas such as ...